IBIS Macromodel Task Group

Meeting date: 31 March 2020

Members (asterisk for those attending):
Achronix Semiconductor:     * Hansel Dsilva
ANSYS:                      * Curtis Clark
                            * Wei-hsing Huang
Cadence Design Systems:     * Ambrish Varma
                              Ken Willis
                            * Jared James
Intel:                      * Michael Mirmak
Keysight Technologies:      * Fangyi Rao
                              Radek Biernacki
                              Ming Yan
                            * Todd Bermensolo
Marvell:                      Steve Parker
Mentor, A Siemens Business: * Arpad Muranyi
Micron Technology:          * Randy Wolff
                            * Justin Butterfield
SiSoft (Mathworks):         * Walter Katz
                              Mike LaBonte
Teraspeed Labs:             * Bob Ross

The meeting was led by Arpad Muranyi.  Curtis Clark took the minutes.

--------------------------------------------------------------------------------
Opens:

- New Attendee:  Jared James of Cadence introduced himself.  He said he had
  worked with AMI models for many years.  He had worked in Cadence's IP group,
  where he developed AMI models.  He now works with Ambrish and Ken on the tool
  development side.

-------------
Review of ARs:

- Hansel to send an email to ATM about his step vs. pulse response question.
  - Done.  The email was sent to ATM shortly before this meeting.
  
--------------------------
Call for patent disclosure:

- None.

-------------------------
Review of Meeting Minutes:

Arpad asked for any comments or corrections to the minutes of the March 24
meeting.  Walter moved to approve the minutes.  Randy seconded the motion.
There were no objections.

-------------
New Discussion:

Discussion on "Gap in IBIS for sampling with statistical mode AMI models":
Hansel reported that he is progressing on an initial BIRD draft.  He is waiting
for feedback from collaborators and hopes to have a presentation for ATM in two
weeks.

DDR Clock Forwarding:
Arpad recapped the discussion from last week and noted that a major topic was
whether a new function signature (AMI_GetWave2()) was required or whether use of
the clock_times array as an input was sufficient.

Fangyi shared a "Clock Forwarding BIRD Discussion" presentation he had created
to advance the discussion.

slide 2: Effects that Can Be Modeled by GetWave2
- With the DQS waveform available, one can model:
  - clock forwarding and DQ-DQS correlated jitter tracking
  - DQ slicer sensitivity to DQS slew rate
  - physical model of DQ Rx phase interpolator (PI)
    - nonlinearity and discretization in PI output delay
    - DQS jitter amplification by the PI
  - DQS correlated voltage noise on PI, slicer and DFE

slide 3: PI Output Delay Nonlinearity and Discretization
Fangyi noted that the "Ideal Output" figure shows the output waveforms for each
value of n from 0 to N.  A newly introduced figure demonstrates the non-uniform
spacing of the crossing delays as n varies from 0 to N in the Ideal Input
waveform case.  Fangyi said that delay nonlinearity and discretization can only
be captured by a physical model of the PI, and this is only possible with the
full DQS waveform.

Michael asked what defines the value of N.  Fangyi said it is a design parameter
that is fixed.  Walter explained that a typical value of N is 32, and it is
equivalent to a rotator with 32 taps.  If the UI were 160ps, for example, then
nominally each tap is separated by 5ps.  One design for a PI might be a
phase-locked loop (PLL) running at a high frequency with 32 taps, and this would
be quite linear.  The PI we are discussing here is not a PLL, however, and this
method of generating phase interpolation and moving the zero crossing has
nonlinearities.  Fangyi said that if you apply a low pass filter (LPF) after
the PI output, then the output waveform will be smoother.  This is shown in the
"Output after LPF" figure, and it improves the linearity of the delay as a
function of n.
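
As an illustration only (not taken from the presentation), the Python sketch
below shows how non-uniform crossing delays can arise.  It assumes a simple
mixing-type PI whose output is a weighted sum of two delayed copies of the
input edge, with weight n/N.  The linear-ramp edge shape, the rise time, the
choice of tau1 and tau2, and all function names are assumptions made for the
sketch; the rise time is chosen longer than tau2 - tau1 so the two edges
overlap.

```python
# Illustrative sketch only; the edge shape and all numeric values are assumptions.

def ramp_edge(t, t_rise):
    """Rising edge that crosses zero at t = 0 and swings from -1 to +1."""
    return max(-1.0, min(1.0, 2.0 * t / t_rise))


def pi_output(t, n, n_steps, tau1, tau2, t_rise):
    """Assumed mixing PI: weighted sum of two delayed copies of the input edge."""
    w = n / n_steps
    return (1.0 - w) * ramp_edge(t - tau1, t_rise) + w * ramp_edge(t - tau2, t_rise)


def crossing_delay(n, n_steps, tau1, tau2, t_rise, iterations=60):
    """Locate the zero crossing of the PI output by bisection on [tau1, tau2]."""
    lo, hi = tau1, tau2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pi_output(mid, n, n_steps, tau1, tau2, t_rise) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


UI_PS = 160.0                        # UI from Walter's example
N_STEPS = 32                         # typical N from Walter's example
NOMINAL_TAP_PS = UI_PS / N_STEPS     # 160 ps / 32 taps = 5 ps per tap

TAU1_PS, TAU2_PS = 0.0, 0.5 * UI_PS  # assumed delays of the two mixed copies
T_RISE_PS = 120.0                    # assumed edge rise time

delays = [crossing_delay(n, N_STEPS, TAU1_PS, TAU2_PS, T_RISE_PS)
          for n in range(N_STEPS + 1)]
steps = [b - a for a, b in zip(delays, delays[1:])]

for n, d in enumerate(delays):
    print(f"n={n:2d}  crossing delay = {d:6.2f} ps")
print(f"ideal step = {(TAU2_PS - TAU1_PS) / N_STEPS:.2f} ps, "
      f"actual steps range from {min(steps):.2f} to {max(steps):.2f} ps")
```

In this toy model the step size is uniform only while both ramps are active at
the crossing; near n = 0 and n = N one copy is saturated and the steps change
size.  A physical PI model would replace ramp_edge with the actual DQS
waveform, which is the point Fangyi made about needing the full waveform.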

slide 4: DQS Jitter Amplification by the PI
This slide demonstrates an example of jitter amplification.  Given a case with
tau1 = 0, tau2 = 0.5*UI (90 degrees), N=32, and no LPF, a 10% DCD on the input
DQS waveform results in an Output DQS DCD that is greater than or equal to 10%
and varies with n.  The max value of the Output DCD is approximately 40% larger
than the input DCD and occurs at n=16.  The use of an LPF would increase the
amplification.  The effect can't be modeled without the DQS waveform.

slide 5: DQS Correlated Voltage Noise
Fangyi noted that DQ and DQS voltage noise can be correlated.  If this effect is
not considered, the eye width can be underestimated by as much as 10%.  The full
DQS waveform is needed to model voltage noise effects on the PI, slicer and
hence the DFE.

slide 6: GetWave2 vs GetWave
This slide presents a table of the six important effects listed in slide 2.
The table states that GetWave2 can be used to model all of them, while GetWave
with clock_times as an input can only handle one of them.  GetWave with an
internal CDR in the model cannot capture any of the effects.

                                                    GW with       GW with
       Effect to Model                        GW2   clock_times   internal CDR
-------------------------------------------   ---   -----------   ------------
Clock forwarding and DQ-DQS Jitter Tracking   Yes      Yes           No
DQ slicer sensitivity to slew rate            Yes      No            No
Physical model of PI                          Yes      No            No
PI output nonlinearity and discretization     Yes      No            No
DQS correlated voltage noise                  Yes      No            No
DQS jitter amplification by the PI            Yes      No            No

Fangyi said that these factors can critically affect system performance, and
that several IC vendors specifically requested that these effects be modeled.

Fangyi noted that GetWave is still supported, and GetWave and GetWave2 can
coexist.

slide 7: Simulation Flow Complexity
Fangyi said that some had objected to possible complication of the flow.  This
slide shows a system block diagram and defines 3 steps:
1. Compute analog channel output according to current flow (with crosstalk).
2. Compute the output of all DQS Rx DLLs.
3. Compute the output of all DQ Rx DLLs.

Step 3 takes the output of step 2 as an input, but Fangyi noted that all
three steps exist in the current flow.  The only new detail is that step 2 must
be done prior to step 3.  Fangyi said he thought this flow was actually simpler
than the GetWave with clock_times as an input flow, because that flow requires
the EDA tool to extract clock ticks from the DQS waveforms.
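
The ordering constraint described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not the proposed API:
the minutes do not define the AMI_GetWave2 signature, so the Rx models are
represented by hypothetical Python callables, and the function and signal
names are invented for the sketch.

```python
from typing import Callable, Dict, List

Waveform = List[float]


def run_clock_forwarded_flow(
    channel_outputs: Dict[str, Waveform],
    dqs_rx_models: Dict[str, Callable[[Waveform], Waveform]],
    dq_rx_models: Dict[str, Callable[[Waveform, Waveform], Waveform]],
    dq_to_dqs: Dict[str, str],
) -> Dict[str, Waveform]:
    """Run steps 2 and 3; channel_outputs is the result of step 1."""
    # Step 2: every DQS Rx model is evaluated first.
    dqs_out = {name: model(channel_outputs[name])
               for name, model in dqs_rx_models.items()}
    # Step 3: each DQ Rx model receives its own waveform plus the output of
    # its associated DQS Rx model (the stand-in for a GetWave2-style call).
    return {name: model(channel_outputs[name], dqs_out[dq_to_dqs[name]])
            for name, model in dq_rx_models.items()}


# Toy usage with placeholder models that only record the call order.
calls = []


def toy_dqs_rx(wave):
    calls.append("DQS")
    return [0.5 * v for v in wave]


def toy_dq_rx(wave, dqs_wave):
    calls.append("DQ")
    return [v + c for v, c in zip(wave, dqs_wave)]


step1 = {"DQS0": [1.0, -1.0], "DQ0": [0.2, 0.4], "DQ1": [0.1, 0.3]}
result = run_clock_forwarded_flow(
    step1,
    {"DQS0": toy_dqs_rx},
    {"DQ0": toy_dq_rx, "DQ1": toy_dq_rx},
    {"DQ0": "DQS0", "DQ1": "DQS0"},
)
print(calls)
```

The only structural requirement the sketch encodes is that the DQS Rx outputs
exist before any DQ Rx model is called.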

slide 8: Summary
Fangyi said that GetWave2 will address critical DDR5 modeling requirements that
GetWave cannot.  He said that no technical objections to GetWave2's
capabilities had been raised.  He noted that ATM members had collaborated
successfully on DDR5 issues recently with the DC_Offset BIRD (BIRD197.7), and
he asked that we do it again.

Arpad recalled that one of the primary questions from the previous meeting was:
What are the magnitudes of these effects that GetWave2 can model?  He asked if
Fangyi could quantify the magnitude of these effects, as opposed to "yes" or
"no" entries in the table on slide 6.  He asked if there was a way to better
quantify what we would lose by not having GetWave2.  Could we quantify the
impact on predicted eye height, or width, or predicted BER, etc.?  Fangyi said
that system designers typically look at their timing margin with respect to a
mask specified at a given BER, for example 1e-16.  He said that if you look at
that margin without considering jitter amplification by the PI, you could
overestimate the margin by up to 40% of the DCD on DQS.  Similarly, if you
looked at the margin without considering the correlation of DQ and DQS voltage
noise, you could underestimate the timing margin by 10% of the eye width.

Michael noted that he expected to have more feedback from an internal review at
the next meeting.  He asked Fangyi if all of the effects Fangyi had enumerated
meant that a statistical modeling flow would be impossible because it wouldn't
be accurate enough.  He said that one common approach in model development was
to perform low-level time domain simulations and then quantify and capture
effects in parameters that could be used in a statistical simulation.  Fangyi
said that nothing he was describing precludes that.  He said that GetWave2 would
allow the time domain flow simulations to more accurately model the effects and
capture them as parameter values for a statistical flow.

Arpad noted that we had delayed a scheduled straw poll on whether to submit this
BIRD to the Open Forum.  Walter requested that we delay the poll for another
week.  Walter said his team had also researched these various effects and he
would have a presentation on an alternate proposal at next week's meeting.
Michael said he would have more information next week.  Ambrish said they would
take this up with their IP group as well and see if they thought there is a
problem here that needs to be addressed.

Randy asked Fangyi if the fact that this flow removes the internal CDR from
GetWave means that we have to further discuss the simulation flow for training
strobe timing with data.  Fangyi said this is a modeling issue, not a simulation
flow issue.  Randy asked if we need some setup in simulation to get the strobe
input to GetWave2 with the proper timing.  Fangyi said if the Rx were for the
controller, then the model contains the PI and can take care of the alignment.
If the Rx were on the DRAM side, you can do a write-leveling training and set
the skew.

To elaborate on Randy's question, Walter noted that this discussion had been
about DDR reads, where the memory is writing with a well-defined DQ-DQS skew and
the controller (the Rx model in this case) has a PI to get the clock adjusted
to the correct location.  In the case of writes, the DRAM is the Rx and there is
no PI.  There are different delays for the DQ to the latch and the DQS to the
latch.  Memory knows what those delays are, but for a wide bus simulation the
controller has to add the correct skew for each DQ.  What flow would we use to
get the correct controller skew?  That could be some iterative process, and it's
not a back-channel process because when you change the skew the crosstalk moves
around so you have to redo the simulation.  That flow can be complicated, and
that's the one Randy was asking about.  Randy agreed.

Walter said the DDR model could choose to implement GetWave2, or it could use
the existing GetWave and do the clock generation internally.  You can continue
to use the internal CDR GetWave, but for full handling of bit-by-bit DQ-DQS
interaction on a wide bus we need a new flow.  IBIS could leave it to the EDA
tools to decide.  An EDA tool could choose to run 32 simulations to sweep the
skew, for example.  That's a simple but time-consuming flow.

Fangyi agreed that, if solving this from the EDA tool side, the tool could sweep
different values of skew.  Alternatively, the EDA tool could do a two-step
approach.  The first step would be a simulation for the write leveling training,
and the tool could monitor the DQ and DQS waveforms at the DQ input and estimate
the skew.  Then, as step 2, it could use that skew estimate and run another
simulation.
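
The two EDA-tool-side options discussed here, Walter's skew sweep and Fangyi's
two-step approach, can be contrasted with a short Python sketch.  Everything
in it is an assumption made for illustration: the simulate callables stand in
for full channel simulations, the returned "margin" is a placeholder metric,
the training run is assumed to report DQ and DQS crossing times at the DQ
input, and the skew sign convention (DQ crossing minus DQS crossing) is
arbitrary.

```python
from statistics import mean


def sweep_skew(simulate, ui_ps, n_points=32):
    """Option 1: run one full simulation per candidate skew and keep the best."""
    candidates = [i * ui_ps / n_points for i in range(n_points)]
    margins = [simulate(skew) for skew in candidates]
    best = max(range(n_points), key=lambda i: margins[i])
    return candidates[best], margins[best]


def two_step(simulate_training, simulate):
    """Option 2: estimate the skew from a training run, then simulate once."""
    dq_cross_ps, dqs_cross_ps = simulate_training()
    skew = mean(dq - dqs for dq, dqs in zip(dq_cross_ps, dqs_cross_ps))
    return skew, simulate(skew)


# Toy stand-ins for the simulator (hypothetical, for demonstration only).
def toy_simulate(skew_ps):
    # Placeholder margin that peaks when the applied skew is 35 ps.
    return 40.0 - abs(skew_ps - 35.0)


def toy_training():
    # Placeholder crossing times observed at the DQ input during training.
    return [35.0, 195.0], [0.0, 160.0]


print(sweep_skew(toy_simulate, ui_ps=160.0))    # 32 simulations
print(two_step(toy_training, toy_simulate))     # 2 simulations
```

The sweep costs one simulation per candidate skew, while the two-step approach
costs two simulations but depends on how well the skew can be estimated from
the training waveforms.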

Fangyi said you could also address this on the modeling side.  The DRAM Rx model
could use GetWave2 and do skew adjustment internally.  This would be building
some of the controller's functionality into the DRAM model, but that's an
acceptable approach.  Sometimes similar overlap is done in SerDes models.

Randy said that we need some way to coordinate between what the models will do
and what the EDA tools will do.

Todd noted that in GetWave models with an internal CDR, the CDR model tends to
provide some inherent low frequency jitter rejection, and only some high
frequency jitter gets through.  He said this behavior will be incorrect for
clock-forwarding applications, unless there is very low skew between DQ and
DQS.  So, the inherent behavior of an internal CDR model will result in
optimistic estimates in a clock-forwarding application.

Jared noted SerDes models may contain a PI.  It's typically an ideal model that
might not cover some of the nonlinearities described in slide 2.  He asked if
the purpose of the new GetWave2 was to model the behavior of the DQS path.
Fangyi said providing a physical model of the PI was one of the benefits of
GetWave2.

Arpad asked Fangyi to send the presentation to the ATM list.  Fangyi agreed.

BIRD201 Back-channel Statistical Optimization:
Walter noted that this had been submitted to the Open Forum and had been open
for discussion for several months.  He had not yet received any feedback, and at
the next Open Forum meeting he planned to move to schedule a vote.

- Curtis: Motion to adjourn.
- Randy: Second.
- Arpad: Thank you all for joining.

AR: Fangyi to send his "Clock Forwarding BIRD Discussion" presentation to the
    ATM list.

-------------
Next meeting: 07 April 2020 12:00pm PT
-------------

IBIS Interconnect SPICE Wish List:

1) Simulator directives